In [ ]:
%%HTML
<style>
.container { width:100% } 
</style>

Gradient Ascent

The function findMaximum that is defined below takes four arguments:

  • f is a function of the form $\texttt{f}: \mathbb{R}^n \rightarrow \mathbb{R}$. It is assumed that the function f is concave, so every local maximum is already the global maximum.
  • gradF is the gradient of the function f.
  • start is a number or a NumPy array of numbers that is used as the starting point of the search for the maximum.
  • eps is a small floating point number. This number controls the precision. If the position x changes by less than eps (relative to the size of x), then the algorithm stops.

The function findMaximum returns a triple of values of the form $$ (x_{max}, \texttt{fx}, \texttt{cnt}) $$

  1. $x_{max}$ is the position of the maximum,
  2. $\texttt{fx}$ is equal to $\texttt{f}(x_{max})$,
  3. $\texttt{cnt}$ is the number of iterations.

The algorithm computes a sequence $(x_n)_n$ that is defined inductively:

  1. $x_0 := \texttt{start}$,
  2. $x_{n+1} := x_n + \alpha \cdot \nabla f(x_n)$.

The algorithm given below adjusts the learning rate $\alpha$ dynamically: If $f(x_{n+1}) > f(x_n)$, then the step is accepted and the learning rate $\alpha$ is increased by a factor of $1.2$. Otherwise, the learning rate is halved and the step is discarded, i.e. the search continues from $x_n$. This way, the algorithm finds a suitable learning rate by itself.
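
To see the adjustment rule in action on a toy example (purely illustrative, not part of the algorithm's derivation): take $g(x) := -x^2$ with $\nabla g(x) = -2\cdot x$, and start at $x_0 = 1$ with $\alpha = 1$. The trial step yields $x_1 = 1 + 1 \cdot (-2) = -1$ with $g(-1) = -1 = g(1)$; since the value has not increased, the step is rejected and $\alpha$ is halved to $0.5$. The retry yields $x_1 = 1 + 0.5 \cdot (-2) = 0$ with $g(0) = 0 > -1$; this step is accepted and $\alpha$ is increased to $0.6$.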


In [ ]:
def findMaximum(f, gradF, start, eps):
    x     = start
    fx    = f(x)
    alpha = 1.0   # learning rate
    cnt   = 0     # number of iterations
    while True:
        cnt += 1
        xOld, fOld = x, fx
        # Take one gradient step.  (Not written as x += ..., because that would
        # modify xOld and start in place if x is a numpy array.)
        x  = x + alpha * gradF(x)
        fx = f(x)
        print(f'cnt = {cnt}, f({x}) = {fx}')
        print(f'gradient = {gradF(x)}')
        # Stop once x changes by less than eps relative to the size of x.
        if abs(x - xOld) <= abs(x) * eps:
            return x, fx, cnt
        if fx <= fOld:    # f has not increased, learning rate too high
            alpha *= 0.5
            print(f'decrementing: alpha = {alpha}')
            x, fx = xOld, fOld   # discard the step, retry from the old position
        else:             # f has increased
            alpha *= 1.2
            print(f'incrementing: alpha = {alpha}')
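
As a quick smoke test of findMaximum (an illustrative addition; the function g below is just a made-up example), we can apply it to the concave quadratic $g(x) := -(x-3)^2$ with gradient $g'(x) = -2\cdot(x-3)$, whose unique maximum lies at $x = 3$.


In [ ]:
# Illustrative smoke test: the maximum of g(x) = -(x - 3)**2 is at x = 3.
g     = lambda x: -(x - 3) ** 2
gradG = lambda x: -2 * (x - 3)
findMaximum(g, gradG, 0.0, 1e-12)   # expected: x close to 3, g(x) close to 0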

In [ ]:
import numpy as np

We will try to find the maximum of the function $$ f(x) := \sin(x) - \frac{x^2}{2} $$


In [ ]:
def f(x):
    return np.sin(x) - x**2 / 2

Let us plot this function.


In [ ]:
import matplotlib.pyplot as plt
import seaborn           as sns

In [ ]:
X = np.arange(-0.5, 1.8, 0.01)
Y = f(X)
plt.figure(figsize=(15, 10))
sns.set(style='darkgrid')
plt.title('lambda x: sin(x) - x**2/2')
plt.axvline(x=0.0, c='k')
plt.axhline(y=0.0, c='k')
plt.xlabel('x')
plt.ylabel('y')
plt.xticks(np.arange(-0.5, 1.81, step=0.1))
plt.plot(X, Y, color='b')

Clearly, this function has a maximum somewhere between 0.7 and 0.8. Let us use gradient ascent to find it. In order to do so, we have to provide the derivative of this function. We have $$ \frac{\mathrm{d}f}{\mathrm{d}x} = \cos(x) - x. $$


In [ ]:
def fs(x):
    return np.cos(x) - x
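
As a quick sanity check (an illustrative addition, not needed for the rest of the notebook), we can compare fs with a central finite-difference approximation of f at a few sample points; the maximal deviation should be close to zero.


In [ ]:
# Illustrative check: central finite differences should agree with fs.
h      = 1e-6
Xs     = np.array([0.0, 0.5, 0.739, 1.5])
approx = (f(Xs + h) - f(Xs - h)) / (2 * h)
print(np.max(np.abs(approx - fs(Xs))))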

Let us plot the derivative together with the function.


In [ ]:
X2 = np.arange(0.4, 1.1, 0.01)
Ys = fs(X2)
plt.figure(figsize=(15, 10))
sns.set(style='darkgrid')
plt.title('lambda x: sin(x) - x**2/2 and its derivative')
plt.axvline(x=0.0, c='k')
plt.axhline(y=0.0, c='k')
plt.xlabel('x')
plt.ylabel('y')
plt.xticks(np.arange(-0.5, 1.81, step=0.1))
plt.yticks(np.arange(-0.6, 0.61, step=0.1))
plt.plot(X, Y, color='b')
plt.plot(X2, Ys, color='r')

In [ ]:
x_max, _, _ = findMaximum(f, fs, 0.0, 1e-15)

In [ ]:
x_max

The maximum seems to be at $x \approx 0.739085$. Let's check the derivative at this position.


In [ ]:
fs(x_max)
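
As an additional cross-check (an illustrative addition that assumes SciPy is available in this environment), we can maximize f with scipy.optimize.minimize_scalar by minimizing $-f$ on a bracketing interval; the result should agree with x_max.


In [ ]:
# Illustrative cross-check using SciPy's bounded scalar minimizer on -f.
from scipy.optimize import minimize_scalar

result = minimize_scalar(lambda x: -f(x), bounds=(0.0, 1.5), method='bounded')
result.x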

In [ ]: